Overview

process_historical_market_breadth.py calculates day-by-day market breadth indicators across the entire stock universe, generating a historical time-series dataset for the Market Breadth Dashboard charts.
Pipeline Position: Phase 4 - Historical analytics generation
Critical Function: Powers breadth trend charts with 250 days of advance/decline, SMA breadth, and momentum indicators

Purpose

This script:
  • Processes 250 trading days of historical OHLCV data for all tracked stocks
  • Calculates daily breadth metrics (advances, declines, SMA breadth, etc.)
  • Merges stock breadth with major index price data
  • Outputs a CSV file in a specific row-based format for dashboard consumption

Input Files

all_stocks_fundamental_analysis.json
JSON
required
Master stock list to determine which symbols to process
ohlcv_data/*.csv
CSV
required
Individual stock OHLCV files with columns: Date, Open, High, Low, Close, Volume
indices_ohlcv_data/NIFTY.csv
CSV
required
Nifty 50 OHLCV data used to establish the master timeline (last 250 trading days)
indices_ohlcv_data/*.csv
CSV
required
Index OHLCV files for:
  • NIFTY_MIDCAP_150.csv
  • NIFTY_SMALLCAP_250.csv
  • NIFTY_MIDSMALLCAP_400.csv
  • NIFTY_500.csv

Output Files

market_breadth.csv
CSV
Row-based CSV with each metric as a row and dates as columns. Format:
Type of Info,2025-05-15,2025-05-16,2025-05-17,...
Up by 4% Today,23,45,12,...
Down by 4% Today,8,15,5,...
5 Day Ratio,1.45,1.52,1.38,...
Above 200MA %,68.5,69.2,70.1,...
Nifty 50,22450.30,22523.15,22601.80,...
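Because the file stores metrics as rows and dates as columns, a consumer typically transposes it back into the usual dates-as-rows shape. A minimal sketch (the sample data below is illustrative, not real output):

```python
import io

import pandas as pd

# Small illustrative sample of the row-based layout.
sample = """Type of Info,2025-05-15,2025-05-16
Up by 4% Today,23,45
Above 200MA %,68.5,69.2
"""

# Read with the metric name as the index, then transpose so the
# index becomes dates and each column is a metric time series.
df = pd.read_csv(io.StringIO(sample), index_col="Type of Info")
ts = df.T
```

After the transpose, `ts["Up by 4% Today"]` is a per-date series ready for plotting.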
market_breadth.json.gz
JSON (gzipped)
Compressed JSON version of the breadth data (currently a placeholder in the code)
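Since the gzipped JSON output is still a placeholder, the eventual payload shape is an assumption. One plausible sketch of how the companion file could be written and read back:

```python
import gzip
import json

# Hypothetical payload shape; the script currently writes a placeholder.
breadth = {
    "dates": ["2025-05-15", "2025-05-16"],
    "Up by 4% Today": [23, 45],
}

# Write the compressed JSON companion file.
with gzip.open("market_breadth.json.gz", "wt", encoding="utf-8") as f:
    json.dump(breadth, f)

# Read it back for verification.
with gzip.open("market_breadth.json.gz", "rt", encoding="utf-8") as f:
    restored = json.load(f)
```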

Processing Logic

1. Master Timeline Establishment

Uses Nifty 50’s last 250 trading days as the reference timeline:
LOOKBACK_DAYS = 250

nifty_path = os.path.join(INDEX_OHLCV_DIR, "NIFTY.csv")
nifty_df = pd.read_csv(nifty_path)
timeline = nifty_df['Date'].tail(LOOKBACK_DAYS).tolist()
date_to_idx = {date: i for i, date in enumerate(timeline)}
num_days = len(timeline)
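Note that `tail(LOOKBACK_DAYS)` assumes the Nifty CSV is already sorted oldest-to-newest. A defensive variant (a sketch, not in the script itself) sorts by date first; ISO-format date strings sort chronologically:

```python
import pandas as pd

# Illustrative out-of-order input; real data comes from NIFTY.csv.
nifty_df = pd.DataFrame({"Date": ["2025-05-16", "2025-05-15", "2025-05-17"]})

# Sorting first makes tail() robust to the file's row order.
nifty_df = nifty_df.sort_values("Date")
timeline = nifty_df["Date"].tail(250).tolist()
```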

2. Breadth Counter Initialization

Creates NumPy arrays for efficient metric storage:
# 1-D counter arrays, one slot per trading day in the timeline
advances = np.zeros(num_days)
declines = np.zeros(num_days)
above_200ma = np.zeros(num_days)
above_50ma = np.zeros(num_days)
above_20ma = np.zeros(num_days)
above_10ma = np.zeros(num_days)
up_4pc = np.zeros(num_days)
down_4pc = np.zeros(num_days)
high_52w = np.zeros(num_days)
low_52w = np.zeros(num_days)
vol_plus = np.zeros(num_days)
vol_minus = np.zeros(num_days)

3. Stock-Level Processing

For each stock, calculates technical indicators and updates daily counters:
for csv_path in csv_files:
    symbol = os.path.basename(csv_path).replace(".csv", "")
    if symbol not in valid_symbols: continue
    
    # Re-read full history for technicals to avoid edge effects
    full_df = pd.read_csv(csv_path)
    full_df['SMA_10'] = full_df['Close'].rolling(10).mean()
    full_df['SMA_20'] = full_df['Close'].rolling(20).mean()
    full_df['SMA_50'] = full_df['Close'].rolling(50).mean()
    full_df['SMA_200'] = full_df['Close'].rolling(200).mean()
    full_df['Vol_SMA_20'] = full_df['Volume'].rolling(20).mean()
    full_df['H_52W'] = full_df['High'].rolling(252).max()
    full_df['L_52W'] = full_df['Low'].rolling(252).min()
    full_df['Prev_Close'] = full_df['Close'].shift(1)
    full_df['Daily_Ret'] = ((full_df['Close'] - full_df['Prev_Close']) / full_df['Prev_Close']) * 100

    # Filter back to timeline
    analysis_df = full_df[full_df['Date'].isin(timeline)]
    
    for _, row in analysis_df.iterrows():
        idx = date_to_idx.get(row['Date'])
        if idx is None: continue
        
        # Metrics Calculation
        if row['Close'] > row['Prev_Close']: advances[idx] += 1
        if row['Close'] < row['Prev_Close']: declines[idx] += 1
        
        if row['Close'] > row['SMA_200']: above_200ma[idx] += 1
        if row['Close'] > row['SMA_50']: above_50ma[idx] += 1
        if row['Close'] > row['SMA_20']: above_20ma[idx] += 1
        if row['Close'] > row['SMA_10']: above_10ma[idx] += 1
        
        if row['Daily_Ret'] >= 4: up_4pc[idx] += 1
        if row['Daily_Ret'] <= -4: down_4pc[idx] += 1
        
        if row['High'] >= row['H_52W']: high_52w[idx] += 1
        if row['Low'] <= row['L_52W']: low_52w[idx] += 1
        
        if row['Volume'] > row['Vol_SMA_20']: vol_plus[idx] += 1
        else: vol_minus[idx] += 1
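The per-row `iterrows` loop could also be vectorized: compute each boolean condition column-wise, then scatter-add into the counters at the timeline positions. A sketch under the same names as above (the tiny DataFrame here is illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative stand-ins for the loop's inputs.
analysis_df = pd.DataFrame({
    "Date": ["2025-05-15", "2025-05-16"],
    "Close": [105.0, 99.0],
    "Prev_Close": [100.0, 105.0],
    "SMA_200": [90.0, 100.0],
})
date_to_idx = {"2025-05-15": 0, "2025-05-16": 1}
num_days = 2
advances = np.zeros(num_days)
above_200ma = np.zeros(num_days)

# Map each row's date to its timeline slot, then add the boolean
# flags (True counts as 1) into the counters in one shot.
idx = analysis_df["Date"].map(date_to_idx).to_numpy()
np.add.at(advances, idx, (analysis_df["Close"] > analysis_df["Prev_Close"]).to_numpy())
np.add.at(above_200ma, idx, (analysis_df["Close"] > analysis_df["SMA_200"]).to_numpy())
```

`np.add.at` is used rather than plain indexing so that repeated timeline slots (multiple stocks on the same date) accumulate correctly.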

4. Advance/Decline Ratio Calculation

Calculates rolling A/D ratios:
def calc_ratio(adv, dec, window):
    r = []
    for i in range(len(adv)):
        start = max(0, i - window + 1)
        sum_adv = sum(adv[start:i+1])
        sum_dec = sum(dec[start:i+1])
        ratio = round(sum_adv / sum_dec, 2) if sum_dec > 0 else 1.0
        r.append(ratio)
    return r

rows.append(to_csv_row("5 Day Ratio", calc_ratio(advances, declines, 5)))
rows.append(to_csv_row("10 Day Ratio", calc_ratio(advances, declines, 10)))
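A tiny worked example of `calc_ratio` (reproduced here so the snippet is self-contained): with a window of 2, each day's ratio sums the current and previous day's counts.

```python
def calc_ratio(adv, dec, window):
    # Rolling advance/decline ratio; 1.0 when no declines in the window.
    r = []
    for i in range(len(adv)):
        start = max(0, i - window + 1)
        sum_adv = sum(adv[start:i + 1])
        sum_dec = sum(dec[start:i + 1])
        r.append(round(sum_adv / sum_dec, 2) if sum_dec > 0 else 1.0)
    return r

advances = [30, 10, 20]
declines = [10, 10, 0]
ratios = calc_ratio(advances, declines, 2)  # [3.0, 2.0, 3.0]
```

Day 1: 30/10 = 3.0; day 2: (30+10)/(10+10) = 2.0; day 3: (10+20)/(10+0) = 3.0.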

5. CSV Assembly

Assembles the final CSV in row-based format:
rows = []
rows.append("Type of Info," + ",".join(timeline))

# Momentum Indicators
rows.append(to_csv_row("Up by 4% Today", up_4pc.astype(int)))
rows.append(to_csv_row("Down by 4% Today", down_4pc.astype(int)))

# A/D Ratios
rows.append(to_csv_row("5 Day Ratio", calc_ratio(advances, declines, 5)))
rows.append(to_csv_row("10 Day Ratio", calc_ratio(advances, declines, 10)))

# Breadth Percentages
total_tracked = max(processed_count, 1)
rows.append(to_csv_row("Above 200MA %", np.round(above_200ma / total_tracked * 100, 1)))
rows.append(to_csv_row("Above 50MA %", np.round(above_50ma / total_tracked * 100, 1)))
rows.append(to_csv_row("Above 20MA %", np.round(above_20ma / total_tracked * 100, 1)))
rows.append(to_csv_row("Above 10MA %", np.round(above_10ma / total_tracked * 100, 1)))

# 52-Week Extremes
rows.append(to_csv_row("Reached 52w High", high_52w.astype(int)))
rows.append(to_csv_row("Reached 52w Low", low_52w.astype(int)))

# Volume
rows.append(to_csv_row("Volume greater than 20Day Average", vol_plus.astype(int)))
rows.append(to_csv_row("Volume less than 20Day Average", vol_minus.astype(int)))

# Raw Counts
rows.append(to_csv_row("Advances", advances.astype(int)))
rows.append(to_csv_row("Declines", declines.astype(int)))

# Index Prices
for label, prices in index_data.items():
    rows.append(to_csv_row(label, prices))
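The helper `to_csv_row` is used throughout but not shown in the snippets above. A minimal version consistent with its usage (an assumption about the actual implementation) would be:

```python
def to_csv_row(label, values):
    """Join a metric label with its per-day values into one CSV row."""
    return label + "," + ",".join(str(v) for v in values)

row = to_csv_row("Advances", [120, 98, 143])
```

The assembled `rows` list would then be joined with newlines and written to market_breadth.csv.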

Output Metrics

Momentum Indicators

Up by 4% Today
integer[]
Daily count of stocks with +4% or greater return
Down by 4% Today
integer[]
Daily count of stocks with -4% or worse return

Advance/Decline Ratios

5 Day Ratio
float[]
5-day rolling advance/decline ratio
  • Values > 1.0 indicate bullish breadth
  • Values < 1.0 indicate bearish breadth
10 Day Ratio
float[]
10-day rolling advance/decline ratio

Moving Average Breadth

Above 200MA %
float[]
Percentage of stocks trading above their 200-day SMA (daily)
Above 50MA %
float[]
Percentage of stocks trading above their 50-day SMA (daily)
Above 20MA %
float[]
Percentage of stocks trading above their 20-day SMA (daily)
Above 10MA %
float[]
Percentage of stocks trading above their 10-day SMA (daily)

52-Week Extremes

Reached 52w High
integer[]
Daily count of stocks hitting new 52-week highs
Reached 52w Low
integer[]
Daily count of stocks hitting new 52-week lows

Volume Metrics

Volume greater than 20Day Average
integer[]
Count of stocks with above-average volume
Volume less than 20Day Average
integer[]
Count of stocks with below-average volume

Index Prices

Nifty 50
float[]
Daily closing prices for Nifty 50
Nifty 500
float[]
Daily closing prices for Nifty 500
Nifty Midcap 150
float[]
Daily closing prices for Nifty Midcap 150
Nifty Smallcap 250
float[]
Daily closing prices for Nifty Smallcap 250
Nifty Midsmallcap 400
float[]
Daily closing prices for Nifty Midsmallcap 400

Usage Example

python process_historical_market_breadth.py
Expected Output:
⏳ Loading master stock list...
Targeting 2847 stocks for historical breadth.
🧬 Processing stock-level history...
✅ Analyzed 2847 stocks. Merging with Index data...
🚀 Market Breadth Historical Data generated: /path/to/market_breadth.csv

Performance Optimization

  • Uses NumPy arrays for memory efficiency with large datasets
  • Processes full history once per stock to calculate technical indicators correctly
  • Filters to timeline only for final analysis to reduce computation
  • Avoids edge effects by using full historical data for rolling calculations

Data Quality Notes

SMA Edge Effects Prevention: The script reads the full historical CSV for each stock to calculate SMAs properly, then filters to the 250-day timeline. This prevents incorrect SMA values at the beginning of the timeline.
Placeholder Metrics: Some metrics like “Up by 25% in Month” and “Nifty 500 % of W&M RSI > 60” are currently placeholders (zeros) and may be implemented in future versions.